Results 1 - 20 of 8,432
1.
J Acoust Soc Am ; 155(4): 2627-2635, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38629884

ABSTRACT

Passive acoustic monitoring (PAM) is an optimal method for detecting and monitoring cetaceans as they frequently produce sound while underwater. Cue counting, counting acoustic cues of deep-diving cetaceans instead of animals, is an alternative method for density estimation, but requires an average cue production rate to convert cue density to animal density. Limited information about click rates exists for sperm whales in the central North Pacific Ocean. In the absence of acoustic tag data, we used towed hydrophone array data to calculate the first sperm whale click rates from this region and examined their variability based on click type, location, distance of whales from the array, and group size estimated by visual observers. Our findings show click type to be the most important variable, with groups that include codas yielding the highest click rates. We also found a positive relationship between group size and click detection rates that may be useful for acoustic predictions of group size in future studies. Echolocation clicks detected using PAM methods are often the only indicator of deep-diving cetacean presence. Understanding the factors affecting their click rates provides important information for acoustic density estimation.
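The conversion from cue density to animal density mentioned above can be sketched in a few lines. This is a minimal illustration only: the function name and all numbers are hypothetical, and it assumes perfect detection with no false positives, which a real density estimate would have to correct for.

```python
def animal_density_from_cues(n_cues, area_km2, duration_s, cue_rate_per_s):
    """Convert a cue count into an animal density (animals per km^2).

    Simplifying assumptions (not taken from the abstract): every cue inside
    the monitored area is detected and there are no false detections.
    """
    if min(area_km2, duration_s, cue_rate_per_s) <= 0:
        raise ValueError("area, duration and cue rate must be positive")
    cue_density = n_cues / (area_km2 * duration_s)  # cues per km^2 per second
    return cue_density / cue_rate_per_s             # animals per km^2


# Hypothetical numbers, chosen only to illustrate the arithmetic.
density = animal_density_from_cues(
    n_cues=7200, area_km2=100.0, duration_s=3600.0, cue_rate_per_s=1.0
)
print(f"{density:.3f} animals per km^2")
```

Because the cue rate sits in the denominator, any bias in the assumed click rate passes straight through to the density estimate, which is why the variability in click rates matters.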


Subjects
Echolocation , Sperm Whale , Animals , Vocalization, Animal , Acoustics , Whales , Sound Spectrography
2.
PLoS One ; 19(4): e0299250, 2024.
Article in English | MEDLINE | ID: mdl-38635752

ABSTRACT

Passive acoustic monitoring has improved our understanding of vocalizing organisms in remote habitats and during all weather conditions. Many vocally active species are highly mobile, and their populations overlap. However, distinct vocalizations allow the tracking and discrimination of individuals or populations. Using signature whistles, the individually distinct calls of bottlenose dolphins, we calculated a minimum abundance of individuals, characterized and compared signature whistles from five locations, and determined reoccurrences of individuals throughout the Mid-Atlantic Bight and Chesapeake Bay, USA. We identified 1,888 signature whistles, whose duration, number of extrema, and start, end, and minimum frequencies varied significantly by site. All characteristics of signature whistles were deemed important for determining the site from which a whistle originated. Because of the distinct signature whistle characteristics and the lack of spatial mixing of the dolphins detected at the Offshore site, we suspect that these dolphins belong to a different population than those at the Coastal and Bay sites. Signature whistles were also found to be shorter when sound levels were higher. Using only the passively recorded vocalizations of this marine top predator, we obtained information about its population and how it is affected by ambient sound levels, which will increase as offshore wind energy is developed. In this rapidly developing area, these calls offer critical management insights for this protected species.


Subjects
Bottle-Nosed Dolphin , Vocalization, Animal , Animals , Sound Spectrography , Ecosystem
3.
Sci Rep ; 14(1): 6062, 2024 03 13.
Article in English | MEDLINE | ID: mdl-38480760

ABSTRACT

With the large increase in human marine activity, our seas have become populated with vessels that can be overheard from distances of even 20 km. Prior investigations showed that such a dense presence of vessels impacts the behaviour of marine animals, and in particular dolphins. While previous explorations were based on linear observations of changes in the features of dolphin whistles, in this work we examine non-linear responses of bottlenose dolphins (Tursiops truncatus) to the presence of vessels. We explored the response of dolphins to vessels by continuously recording acoustic data using two long-term acoustic recorders deployed near a shipping lane and a dolphin habitat in Eilat, Israel. Using deep learning methods we detected 50,000 whistles, which were clustered to associate whistle traces and to characterize their features, in both structure and quantity, in order to discriminate dolphin vocalizations. Using a non-linear classifier, the whistles were categorized into two classes representing the presence or absence of a nearby vessel. Although our database does not show a linearly observable change in the features of the whistles, we obtained true positive and true negative rates exceeding 90% on separate, left-out test sets. We argue that this success in classification serves as a statistical proof for a non-linear response of dolphins to the presence of vessels.


Subjects
Bottle-Nosed Dolphin , Vocalization, Animal , Animals , Humans , Vocalization, Animal/physiology , Bottle-Nosed Dolphin/physiology , Acoustics , Oceans and Seas , Ships , Sound Spectrography
4.
J Acoust Soc Am ; 155(3): 2050-2064, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38477612

ABSTRACT

The study of humpback whale song using passive acoustic monitoring devices requires bioacousticians to manually review hours of audio recordings to annotate the signals. To vastly reduce the time of manual annotation through automation, a machine learning model was developed. Convolutional neural networks have made major advances in the previous decade, leading to a wide range of applications, including the detection of frequency modulated vocalizations by cetaceans. A large dataset of over 60 000 audio segments of 4 s length was collected from the North Atlantic and used to fine-tune an existing model for humpback whale song detection in the North Pacific (see Allen, Harvey, Harrell, Jansen, Merkens, Wall, Cattiau, and Oleson (2021). Front. Mar. Sci. 8, 607321). Furthermore, different data augmentation techniques (time-shift, noise augmentation, and masking) are used to artificially increase the variability within the training set. Retraining and augmentation yield F-score values of 0.88 on a context-window basis and 0.89 on an hourly basis, with false positive rates of 0.05 on a context-window basis and 0.01 on an hourly basis. If necessary, usage and retraining of the existing model is made convenient by a framework (AcoDet, acoustic detector) built during this project. Combining the tools provided by this framework could save researchers hours of manual annotation time and, thus, accelerate their research.
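The three augmentation types named above (time-shift, noise augmentation, and masking) can be illustrated with a small sketch. It is not the AcoDet implementation: the function names are hypothetical, the operations act on a raw waveform held in a Python list, and the original work may apply them differently (for example on spectrograms).

```python
import random


def time_shift(samples, shift):
    """Circularly shift a segment to the right by `shift` samples."""
    if not samples:
        return []
    shift %= len(samples)
    return list(samples[-shift:]) + list(samples[:-shift]) if shift else list(samples)


def add_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation `noise_std`."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def mask(samples, start, length):
    """Zero out `length` consecutive samples beginning at index `start`."""
    out = list(samples)
    for i in range(max(start, 0), min(start + length, len(out))):
        out[i] = 0.0
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
segment = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # toy waveform
augmented = mask(add_noise(time_shift(segment, 2), 0.01, rng), start=3, length=2)
print(len(augmented))
```

Each function returns a new list, so the same original segment can be augmented several times with different parameters.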


Subjects
Humpback Whale , Animals , Vocalization, Animal , Sound Spectrography , Time Factors , Seasons , Acoustics
5.
J Acoust Soc Am ; 155(2): 1253-1263, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341748

ABSTRACT

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.


Subjects
Voice , Child , Humans , Acoustics , Speech Acoustics , Vibration , Sound Spectrography
6.
J Acoust Soc Am ; 155(2): 1437-1450, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38364047

ABSTRACT

Odontocetes produce clicks for echolocation and communication. Most odontocetes are thought to produce either broadband (BB) or narrowband high-frequency (NBHF) clicks. Here, we show that the click repertoire of Hector's dolphin (Cephalorhynchus hectori) comprises highly stereotypical NBHF clicks and far more variable broadband clicks, with some that are intermediate between these two categories. Both NBHF and broadband clicks were made in trains, buzzes, and burst-pulses. Most clicks within click trains were typical NBHF clicks, which had a median centroid frequency of 130.3 kHz (median -10 dB bandwidth = 29.8 kHz). Some, however, while having only marginally lower centroid frequency (median = 123.8 kHz), had significant energy below 100 kHz and approximately double the bandwidth (median -10 dB bandwidth = 69.8 kHz); we refer to these as broadband. Broadband clicks in buzzes and burst-pulses had lower median centroid frequencies (120.7 and 121.8 kHz, respectively) compared to NBHF buzzes and burst-pulses (129.5 and 130.3 kHz, respectively). Source levels of NBHF clicks, estimated by using a drone to measure ranges from a single hydrophone and by computing time-of-arrival differences at a vertical hydrophone array, ranged from 116 to 171 dB re 1 µPa at 1 m, whereas source levels of broadband clicks, obtained from array data only, ranged from 138 to 184 dB re 1 µPa at 1 m. Our findings challenge the grouping of toothed whales as either NBHF or broadband species.
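Source levels such as those quoted above are commonly back-calculated from a received level and a measured range. The sketch below shows that arithmetic under stated assumptions: spherical spreading plus a linear absorption term. The function name and the example values (including the absorption coefficient) are hypothetical and are not taken from the study.

```python
import math


def source_level_db(received_level_db, range_m, absorption_db_per_m=0.0):
    """Back-calculate a source level (dB re 1 uPa at 1 m).

    Assumes spherical spreading, 20*log10(r), plus frequency-dependent
    absorption expressed in dB per metre.
    """
    if range_m <= 0:
        raise ValueError("range must be positive")
    transmission_loss = 20.0 * math.log10(range_m) + absorption_db_per_m * range_m
    return received_level_db + transmission_loss


# Hypothetical example: 120 dB received at 100 m with 0.04 dB/m absorption.
sl = source_level_db(120.0, 100.0, absorption_db_per_m=0.04)
print(f"{sl:.1f} dB re 1 uPa at 1 m")
```

At the high frequencies of NBHF clicks the absorption term is not negligible, so an error in the measured range affects both terms of the transmission loss.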


Subjects
Dolphins , Echolocation , Animals , Acoustics , Vocalization, Animal , Sound Spectrography
7.
J Acoust Soc Am ; 155(1): 274-283, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38215217

ABSTRACT

Echolocating bats and dolphins use biosonar to determine target range, but differences in range discrimination thresholds have been reported for the two species. Whether these differences represent a true difference in their sensory system capability is unknown. Here, the dolphin's range discrimination threshold as a function of absolute range and echo-phase was investigated. Using phantom echoes, the dolphins were trained to echo-inspect two simulated targets and indicate the closer target by pressing a paddle. One target was presented at a time, requiring the dolphin to hold the initial range in memory while comparing it to the second target. Range was simulated by manipulating echo-delay while the received echo levels, relative to the dolphins' clicks, were held constant. Range discrimination thresholds were determined at seven different ranges from 1.75 to 20 m. In contrast to bats, range discrimination thresholds increased from 4 to 75 cm across the entire range tested. To investigate the acoustic features used more directly, discrimination thresholds were determined when the echo was given a random phase shift (±180°). Results for the constant-phase versus the random-phase echo were quantitatively similar, suggesting that dolphins used the envelope of the echo waveform to determine the difference in range.
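Simulating range through echo delay relies on the two-way travel time of sound. A minimal sketch, assuming a nominal sound speed of 1500 m/s in seawater (the abstract does not state the value used) and hypothetical function names:

```python
SOUND_SPEED_M_PER_S = 1500.0  # assumed nominal value for seawater


def echo_delay_s(range_m, c=SOUND_SPEED_M_PER_S):
    """Two-way travel time for a target at `range_m` metres."""
    return 2.0 * range_m / c


def range_from_delay_m(delay_s, c=SOUND_SPEED_M_PER_S):
    """Simulated target range for a given echo delay."""
    return c * delay_s / 2.0


# The tested ranges and thresholds from the abstract, expressed as delays.
for label, metres in [("nearest range", 1.75), ("farthest range", 20.0),
                      ("smallest threshold", 0.04), ("largest threshold", 0.75)]:
    print(f"{label}: {metres} m -> {echo_delay_s(metres) * 1e3:.3f} ms")
```

Under this assumption, a range difference of a few centimetres corresponds to a delay difference of tens of microseconds.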


Subjects
Bottle-Nosed Dolphin , Chiroptera , Echolocation , Animals , Acoustics , Sound Spectrography
8.
J Acoust Soc Am ; 155(1): 396-404, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38240666

ABSTRACT

When they are exposed to loud fatiguing sounds in the oceans, marine mammals are susceptible to hearing damage in the form of temporary hearing threshold shifts (TTSs) or permanent hearing threshold shifts. We compared the level-dependent and frequency-dependent susceptibility to TTSs in harbor seals and harbor porpoises, species with different hearing sensitivities in the low- and high-frequency regions. Both species were exposed to 100% duty cycle one-sixth-octave noise bands at frequencies that covered their entire hearing range. In the case of the 6.5 kHz exposure for the harbor seals, a pure tone (continuous wave) was used. TTS was quantified as a function of sound pressure level (SPL) half an octave above the center frequency of the fatiguing sound. The species have different audiograms, but their frequency-specific susceptibility to TTS was more similar. The hearing frequency range in which both species were most susceptible to TTS was 22.5-50 kHz. Furthermore, the frequency ranges were characterized by having similar critical levels (defined as the SPL of the fatiguing sound above which the magnitude of TTS induced as a function of SPL increases more strongly). This standardized between-species comparison indicates that the audiogram is not a good predictor of frequency-dependent susceptibility to TTS.


Subjects
Phoca , Phocoena , Animals , Acoustic Stimulation , Auditory Fatigue , Sound Spectrography , Recovery of Function , Hearing , Auditory Threshold
9.
Audiol., Commun. res ; 29: e2826, 2024. tables, graphs
Article in Portuguese | LILACS | ID: biblio-1550051

ABSTRACT

Purpose: To develop the validity step based on the response processes of the Spectrographic Analysis Protocol of the Voice (SAP; PAEV in the original Portuguese). Methods: Ten speech therapists and ten undergraduate Speech Therapy students were recruited. They applied the SAP to ten spectrograms, judged the SAP items, and took part in a cognitive interview (CI). Based on their responses, the SAP was reanalyzed so that items could be reformulated or excluded. The chi-square test and accuracy values were used to analyze the questionnaire responses, and the CI data were analyzed qualitatively. Results: The participants achieved accuracy > 70% on most SAP items. Only seven items had accuracy ≤ 70%. There was a difference between the responses indicating presence versus absence of difficulty in identifying the items in the spectrogram. Most participants had no difficulty identifying the SAP items. In the CI, the intended meaning was not correctly identified for only six items, as verified in the qualitative analysis. In addition, participants suggested excluding five items. Conclusion: After the validation step based on the response processes, the SAP was reformulated. Seven items were excluded and two items were reformulated. Thus, the final version of the SAP after this step was reduced from 25 to 18 items, distributed across the five domains.


Subjects
Humans , Sound Spectrography/methods , Speech Acoustics , Voice Quality , Voice Disorders/diagnostic imaging
10.
J Acoust Soc Am ; 154(6): 3672-3683, 2023 12 01.
Article in English | MEDLINE | ID: mdl-38059727

ABSTRACT

Sound production capabilities and characteristics in Loricariidae, the largest catfish family, have not been well examined. Sounds produced by three loricariid catfish species, Otocinclus affinis, Pterygoplichthys gibbiceps, and Pterygoplichthys pardalis, were recorded. Each of these species produces pulses via pectoral-fin spine stridulation by rubbing the ridged condyle of the dorsal process of the pectoral-fin spine base against a matching groove-like socket in the pectoral girdle. Light and scanning electron microscopy were used to examine the dorsal process of the pectoral-fin spines of these species. Mean distances between dorsal process ridges of O. affinis, P. gibbiceps, and P. pardalis were 53, 161, and 329 µm, respectively. Stridulation sounds occurred during either abduction (type A) or adduction (type B). O. affinis produced sounds through adduction only and P. pardalis through abduction only, whereas P. gibbiceps often produced pulse trains alternating between abduction and adduction. In these species, dominant frequency was an inverse function of sound duration, fish total length, and inter-ridge distance on the dorsal process of the pectoral-fin spine and sound duration increased with fish total length. While stridulation sounds are used in many behavioral contexts in catfishes, the functional significance of sound production in Loricariidae is currently unknown.


Subjects
Catfishes , Sound , Animals , Animal Communication , Body Size , Sound Spectrography
11.
Sci Rep ; 13(1): 21771, 2023 12 08.
Article in English | MEDLINE | ID: mdl-38065973

ABSTRACT

Acoustic sequences have been described in a range of species and in varying complexity. Cetaceans are known to produce complex song displays but these are generally limited to mysticetes; little is known about call combinations in odontocetes. Here we investigate call combinations produced by killer whales (Orcinus orca), a highly social and vocal species. Using acoustic recordings from 22 multisensor tags, we use a first order Markov model to show that transitions between call types or subtypes were significantly different from random, with repetitions and specific call combinations occurring more often than expected by chance. The mixed call combinations were composed of two or three calls and were part of three call combination clusters. Call combinations were recorded over several years, from different individuals, and several social clusters. The most common call combination cluster consisted of six call (sub-)types. Although different combinations were generated, there were clear rules regarding which were the first and last call types produced, and combinations were highly stereotyped. Two of the three call combination clusters were produced outside of feeding contexts, but their function remains unclear and further research is required to determine possible functions and whether these combinations could be behaviour- or group-specific.
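A first-order Markov description of a call sequence reduces to counting which call type follows which. The sketch below estimates those transition probabilities; the call labels and the sequence are hypothetical, and the significance test against random ordering used in the study is not reproduced here.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from one call sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical call-type labels, for illustration only.
calls = ["A", "A", "B", "A", "A", "B", "C"]
probs = transition_probabilities(calls)
for current, followers in sorted(probs.items()):
    print(current, dict(sorted(followers.items())))
```

Comparing such observed probabilities with those from shuffled sequences is one way to judge whether repetitions or specific combinations occur more often than chance would predict.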


Subjects
Killer Whale , Humans , Animals , Vocalization, Animal , Social Behavior , Iceland , Sound Spectrography
12.
Article in English | MEDLINE | ID: mdl-38083776

ABSTRACT

Infant cries provide caregivers with useful clinical insights for making appropriate medical decisions, for example in obstetrics. However, robust infant cry detection in real clinical settings (e.g. obstetrics) is still challenging due to the limited training data in this scenario. In this paper, we propose a scene adaption framework (SAF) including two different learning stages that can quickly adapt the cry detection model to a new environment. The first stage uses the acoustic principle that mixture sources in audio signals are approximately additive to imitate the sounds in clinical settings using public datasets. The second stage utilizes mutual learning to mine the shared characteristics of infant cry between the clinical setting and public dataset to adapt the scene in an unsupervised manner. The clinical trial was conducted in an obstetrics setting, where cry recordings from 200 infants were collected. With SAF, the four classifiers tested for infant cry detection improved their F1-score by nearly 30%, reaching performance similar to that of supervised learning on the target setting. SAF is demonstrated to be an effective plug-and-play tool for improving infant cry detection in new clinical settings. Our code is available at https://github.com/contactless-healthcare/Scene-Adaption-for-Infant-Cry-Detection.
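The additivity principle used in the first stage can be illustrated by mixing a clean recording with background noise at a chosen signal-to-noise ratio. This is a generic sketch, not the code from the linked repository: the function names, the toy signals, and the choice of SNR as the control parameter are assumptions.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mixture has the requested SNR."""
    if len(signal) != len(noise) or not signal:
        raise ValueError("signal and noise must be non-empty and equally long")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


cry = [1.0, -1.0, 1.0, -1.0]          # toy "clean" recording
ward_noise = [0.5, 0.5, -0.5, -0.5]   # toy background noise
mixed = mix_at_snr(cry, ward_noise, snr_db=0.0)
print(mixed)
```

Sweeping the SNR over a range of values yields training mixtures of varying difficulty from the same pair of recordings.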


Subjects
Crying , Obstetrics , Humans , Infant , Acoustics , Sound , Sound Spectrography
13.
Phonetica ; 80(6): 465-493, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37852617

ABSTRACT

John Ohala claimed that the source of sound change may lie in misperceptions which can be replicated in the laboratory. We tested this claim for a historical change of /t/ to /k/ in the coda in the Southern Min dialect of Chaoshan. We conducted a forced-choice segment identification task with CVC syllables in which the final C varied across the segments [p t k ʔ] in addition to a number of further variables, including the V, which ranged across [i u a]. The results from three groups of participants whose native languages have the coda systems /p t k ʔ/ (Zhangquan), /p k ʔ/ (Chaoshan) and /p t k/ (Dutch) indicate that [t] is the least stably perceived segment overall. It is particularly disfavoured when it follows [a], where there is a bias towards [k]. We argue that this finding supports a perceptual account of the historically documented scenario whereby a change from /at/ to /ak/ preceded and triggered a more general merger of /t/ with /k/ in the coda of Chaoshan. While we grant that perceptual sound changes are not the only or even the most common type of sound change, the fact that the perception results are essentially the same across the three language groups lends credibility to Ohala's perceptually motivated sound changes.


Subjects
Phonetics , Speech Perception , Humans , Language , Sound , Sound Spectrography
14.
Anim Cogn ; 26(6): 1915-1927, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37676587

ABSTRACT

A dolphin's signature whistle (SW) is a distinctive acoustic signal, issued in a bout pattern of unique frequency modulation contours; it allows individuals belonging to a given group to recognize each other and, consequently, to maintain contact and cohesion. The current study provides the first scientific evidence that spinner dolphins (Stenella longirostris) produce SWs. Acoustic data were recorded at a shallow rest bay called "Biboca", in Fernando de Noronha Archipelago, Brazil. In total, 1,902 whistles were analyzed; 40% (753/1,902) of them were classified as stereotyped whistles (STW). Based on the SIGID method, 63% (472/753) of all STWs were identified as SWs; subsequently, they were categorized into one of 18 SW types. SWs accounted for 25% (472/1,902) of the acoustic repertoire. External observers showed near-perfect agreement when classifying whistles into the adopted SW categories. Most acoustic and temporal variables measured for SWs showed mean values similar to those recorded in other studies of spinner dolphins, whose authors did not differentiate SWs from non-SWs. Principal component analysis explained 78% of total SW variance and emphasized the relevance of shape/contour and frequency variables to SW variance. This discovery helps improve bioacoustic knowledge of the investigated species. Future studies in Fernando de Noronha Archipelago should focus on continued investigation of SW development and use by S. longirostris, expanding individual identification (Photo ID and SW Noronha Catalog), assessing long-term whistle stability and emission rates, and making mother-offspring comparisons with sex-based differences.


Subjects
Stenella , Animals , Vocalization, Animal , Acoustics , Brazil , Stereotyped Behavior , Sound Spectrography/veterinary
15.
J Acoust Soc Am ; 154(2): 602-618, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37535429

ABSTRACT

Fricatives are obstruent sound contrasts made by airflow constrictions in the vocal tract that produce turbulence across the constriction or at a site downstream from the constriction. Fricatives exhibit significant intra/intersubject and contextual variability. Yet, fricatives are perceived with high accuracy. The current study investigated modeled neural responses to fricatives in the auditory nerve (AN) and inferior colliculus (IC) with the hypothesis that response profiles across populations of neurons provide robust correlates to consonant perception. Stimuli were 270 intervocalic fricatives (10 speakers × 9 fricatives × 3 utterances). Computational model response profiles had characteristic frequencies that were log-spaced from 125 Hz to 8 or 20 kHz to explore the impact of high-frequency responses. Confusion matrices generated by k-nearest-neighbor subspace classifiers were based on the profiles of average rates across characteristic frequencies as feature vectors. Model confusion matrices were compared with published behavioral data. The modeled AN and IC neural responses provided better predictions of behavioral accuracy than the stimulus spectra, and IC showed better accuracy than AN. Behavioral fricative accuracy was explained by modeled neural response profiles, whereas confusions were only partially explained. Extended frequencies improved accuracy based on the model IC, corroborating the importance of extended high frequencies in speech perception.


Subjects
Phonetics , Speech Perception , Humans , Speech Perception/physiology , Sound , Neurons , Sound Spectrography
16.
J Acoust Soc Am ; 154(1): 255-269, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37449786

ABSTRACT

Source depth estimation is an important yet very difficult task for passive sonars, especially for horizontal linear arrays (HLAs). This paper proposes an efficient two-step depth estimation scheme using narrowband and broadband constructive and destructive striation patterns due to interference between the direct (D) and sea surface reflected (SR) arrivals at an HLA on the bottom of deep water. First, the horizontal source-array ranges are derived from triangulation results of solid angle estimates by subarray beamforming. The applicable areas of the method in deep water are investigated through Monte Carlo simulations, assuming different ways of partitioning a given HLA aperture into subarrays. Second, cost functions are built to match the measured beam intensity striations with modeled ones. To mitigate the spatial smoothing effect of the beam intensity striations during beamforming, a criterion of the largest subarray aperture is established, and a computationally efficient way is presented to model the replicas by the D-SR time delay templates at a single element of the array calculated by ray theory. The performance degradation due to limited source range spans, the distortion of the beam intensity striations, and range estimation errors has been analyzed. Two experimental datasets verify the effectiveness of the proposed method.
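The dependence of the D-SR time delay on source depth can be seen in a simplified geometry. The sketch below assumes a constant sound speed, straight rays, and a flat sea surface treated as a mirror (image-source method); the paper itself uses ray theory, which in a realistic sound-speed profile gives different values. The function name and the example geometry are hypothetical.

```python
import math


def d_sr_delay_s(horizontal_range_m, source_depth_m, receiver_depth_m, c=1500.0):
    """Delay of the surface-reflected arrival relative to the direct arrival.

    Isovelocity, straight-ray approximation: the reflected path is modelled
    as the direct path from an image source mirrored above the sea surface.
    """
    direct = math.hypot(horizontal_range_m, receiver_depth_m - source_depth_m)
    reflected = math.hypot(horizontal_range_m, receiver_depth_m + source_depth_m)
    return (reflected - direct) / c


# Hypothetical geometry: receiver on the bottom at 4000 m, source 5 km away.
for depth in (20.0, 100.0, 300.0):
    delay = d_sr_delay_s(5000.0, depth, 4000.0)
    print(f"source depth {depth:5.0f} m -> delay {delay * 1e3:7.2f} ms")
```

In this approximation the delay grows with source depth at a fixed range, which is why matching measured and modeled striations can discriminate between candidate depths.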


Subjects
Sound , Water , Algorithms , Sound Spectrography
17.
Sensors (Basel) ; 23(13)2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37447886

ABSTRACT

This paper proposes a speech recognition method based on a domain-specific language speech network (DSL-Net) and a confidence decision network (CD-Net). The method involves automatically training a domain-specific dataset, using pre-trained model parameters for transfer learning, and obtaining a domain-specific speech model. Importance sampling weights were set for the trained domain-specific speech model, which was then integrated with the trained speech model from the benchmark dataset. This integration automatically expands the lexical content of the model to accommodate the input speech based on the lexicon and language model. The adaptation attempts to address the issue of out-of-vocabulary words that are likely to arise in most realistic scenarios and utilizes external knowledge sources to extend the existing language model. By doing so, the approach enhances the adaptability of the language model in new domains or scenarios and improves the prediction accuracy of the model. For domain-specific vocabulary recognition, a deep fully convolutional neural network (DFCNN) and a connectionist temporal classification (CTC)-based approach were employed to achieve effective recognition of domain-specific vocabulary. Furthermore, a confidence-based classifier was added to enhance the accuracy and robustness of the overall approach. In the experiments, the method was tested on a proprietary domain audio dataset and compared with an automatic speech recognition (ASR) system trained on a large-scale dataset. Based on experimental verification, the model achieved an accuracy improvement from 82% to 91% in the medical domain. The inclusion of domain-specific datasets resulted in a 5% to 7% enhancement over the baseline, while the introduction of model confidence further improved the baseline by 3% to 5%. These findings demonstrate the significance of incorporating domain-specific datasets and model confidence in advancing speech recognition technology.


Subjects
Models, Theoretical , Neural Networks, Computer , Speech Recognition Software , Speech , Speech Perception , Datasets as Topic , Sound Spectrography
18.
PeerJ ; 11: e15687, 2023.
Article in English | MEDLINE | ID: mdl-37483973

ABSTRACT

Long-beaked common dolphin (Delphinus delphis bairdii) distribution is limited to the Eastern North Pacific Ocean. Its whistle repertoire is poorly investigated, with no studies in the Gulf of California. The aim of the present study is to characterize the whistles of this species and compare their parameters with those of other populations. Acoustic monitoring was conducted in La Paz Bay, Gulf of California. Recordings were inspected in spectrogram view in Raven Pro, selecting good quality whistles (n = 270). In the software Luscinia, contours were manually traced to obtain whistle frequencies and duration. Number of steps, inflection points and contour type were visually determined. We calculated the descriptive statistics of the selected whistle parameters and compared the results with those of a dolphin population from the Eastern Pacific Ocean. Permutational multivariate analysis of variance (PERMANOVA) was performed to test the intraspecific variation of the whistle parameters among groups. In the present study the mean values (±SD) of the whistle parameters were: maximum frequency = 14.13 ± 3.71 kHz, minimum frequency = 8.44 ± 2.58 kHz and duration = 0.44 ± 0.31 s. Whistles with the upsweep contour were the most common ones (34.44%). The coefficient of variation (CV) values for modulation parameters were high (>100%), in accordance with other studies on dolphins. Whistle parameters showed significant differences among groups. Finally, ending and maximum frequencies, duration and inflection points of the whistles recorded in the present study were lower compared with the parameters of the long-beaked common dolphins from the Eastern Pacific Ocean. This study provides the first whistle characterization of long-beaked common dolphin from the Gulf of California and it will help future passive acoustic monitoring applications in the study area.
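The coefficient of variation is the standard deviation expressed as a percentage of the mean. As a small worked example, the sketch below applies that definition to the three mean ± SD values reported above; the function name is hypothetical, and the CV values above 100% mentioned in the abstract refer to modulation parameters, which are not listed there.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: 100 * SD / mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * sd / mean


# Mean and SD values as reported in the abstract.
reported = {
    "maximum frequency (kHz)": (14.13, 3.71),
    "minimum frequency (kHz)": (8.44, 2.58),
    "duration (s)": (0.44, 0.31),
}
for name, (mean, sd) in reported.items():
    print(f"{name}: CV = {coefficient_of_variation(mean, sd):.1f}%")
```

Duration is clearly the most variable of the three parameters relative to its mean.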


Subjects
Common Dolphins , Dolphins , Animals , Bays , Vocalization, Animal , Sound Spectrography/methods
19.
J Acoust Soc Am ; 154(1): 502-517, 2023 07 01.
Article in English | MEDLINE | ID: mdl-37493330

ABSTRACT

Many odontocetes produce whistles that feature characteristic contour shapes in spectrogram representations of their calls. Automatically extracting the time × frequency tracks of whistle contours has numerous subsequent applications, including species classification, identification, and density estimation. Deep-learning-based methods, which train models using analyst-annotated whistles, offer a promising way to reliably extract whistle contours. However, the application of such methods can be limited by the significant amount of time and labor required for analyst annotation. To overcome this challenge, a technique that learns from automatically generated pseudo-labels has been developed. These annotations are less accurate than those generated by human analysts but more cost-effective to generate. It is shown that standard training methods do not learn effective models from these pseudo-labels. An improved loss function designed to compensate for pseudo-label error that significantly increases whistle extraction performance is introduced. The experiments show that the developed technique performs well when trained with pseudo-labels generated by two different algorithms. Models trained with the generated pseudo-labels can extract whistles with an F1-score (the harmonic mean of precision and recall) of 86.31% and 87.2% for the two sets of pseudo-labels that are considered. This performance is competitive with a model trained with 12 539 expert-annotated whistles (F1-score of 87.47%).
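The F1-score defined above as the harmonic mean of precision and recall can be computed directly from detection counts. A minimal sketch with hypothetical function names and counts:

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from raw detection counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
p, r = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high F1-score by maximizing precision or recall alone.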


Subjects
Deep Learning , Animals , Humans , Vocalization, Animal , Sound Spectrography , Algorithms , Whales
20.
J Acoust Soc Am ; 154(1): 245-254, 2023 07 01.
Article in English | MEDLINE | ID: mdl-37439638

ABSTRACT

The present work focuses on how the landscape and distance between a bird and an audio recording unit affect automatic species identification. Moreover, it is shown that automatic species identification can be improved by taking into account the effects of landscape and distance. The proposed method uses measurements of impulse responses between the sound source and the recorder. These impulse responses, characterizing the effect of a landscape, can be measured in the real environment, after which they can be convolved with any number of recorded bird sounds to modify an existing set of bird sound recordings. The method is demonstrated using autonomous recording units on an open field and in two different types of forests, varying the distance between the sound source and the recorder. Species identification accuracy improves significantly when the landscape and distance effect is taken into account when building the classification model. The method is demonstrated using bird sounds, but the approach is applicable to other animal vocalizations and to non-animal sounds as well.
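Applying a measured impulse response to a clean recording is a discrete convolution. The sketch below shows the operation in plain Python on toy data; the function name and the sample values are hypothetical, and a practical pipeline would use an FFT-based routine for long recordings.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of a signal with an impulse response."""
    if not signal or not impulse_response:
        raise ValueError("both inputs must be non-empty")
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


dry_call = [1.0, 2.0, 3.0]        # toy "clean" bird sound
landscape_ir = [1.0, 0.0, 0.5]    # toy impulse response: direct path plus one echo
wet_call = convolve(dry_call, landscape_ir)
print(wet_call)  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

One measured impulse response per landscape and distance can be reused across an entire library of clean recordings to build training data for the classifier.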


Subjects
Birds , Vocalization, Animal , Animals , Sound , Forests , Sound Spectrography